@lucidprogrammer
Contributor

Description

Adds @fast.maker decorator implementing MAKER reliability patterns from "Solving a Million-Step LLM Task with Zero Errors".

Enables zero-error multi-step workflows through k-threshold voting and red-flag validation.

Key Features

  • First-to-ahead-by-k voting: Configurable margin requirement for consensus
  • Red-flag filtering: Discard suspicious responses (too long, malformed) before voting
  • Match strategies: exact, normalized
  • Transparent results: Access vote counts, margins, and convergence status via last_result
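The first-to-ahead-by-k rule listed above can be sketched in a few lines. This is a minimal illustration, not the MakerAgent implementation: the function name `first_to_ahead_by_k`, its return shape, and the default `max_samples` are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def first_to_ahead_by_k(
    samples: Iterable[str],
    k: int,
    max_samples: int = 50,
) -> Tuple[Optional[str], Counter, bool]:
    """Consume samples until one answer leads the runner-up by k votes.

    Returns (winner, vote_counts, converged). If the sample budget runs
    out first, the current leader is returned with converged=False.
    """
    votes: Counter = Counter()
    for drawn, answer in enumerate(samples, start=1):
        votes[answer] += 1
        ranked = votes.most_common(2)
        leader, top = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if top - runner_up >= k:
            return leader, votes, True
        if drawn >= max_samples:
            break
    leader = votes.most_common(1)[0][0] if votes else None
    return leader, votes, False


if __name__ == "__main__":
    # Hypothetical worker outputs; in practice each would be a fresh LLM call.
    winner, counts, converged = first_to_ahead_by_k(["A", "B", "A", "A", "A"], k=3)
    print(winner, dict(counts), converged)
```

Sampling lazily (an iterable rather than a fixed batch) is the point of the rule: easy questions stop after k agreeing samples, and only contested ones consume more of the budget.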

Usage

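import asyncio
from fast_agent import FastAgent  # import path assumed; adjust to your installed fast-agent version

fast = FastAgent("MAKER example")  # application name is illustrative

@fast.agent(name="worker", model="claude-3-haiku-20240307")
@fast.maker(
    name="reliable_worker",
    worker="worker",
    k=3,                      # Require 3-vote margin
    match_strategy="normalized",
    red_flag_max_length=100,  # Discard verbose responses
)
async def main():
    async with fast.run() as agent:
        # Each send samples the worker repeatedly until one answer leads by k votes
        result = await agent.reliable_worker.send("Classify this")
        print(result)

if __name__ == "__main__":
    asyncio.run(main())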
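To make the `match_strategy="normalized"` and `red_flag_max_length=100` options in the usage example more concrete, here is one plausible reading of them. The helper names (`normalize`, `is_red_flagged`, `votable`) and the exact rules (lowercasing, whitespace collapsing, treating empty replies as red flags) are assumptions for illustration; the PR's own rules may differ.

```python
import re
from typing import List, Optional


def normalize(answer: str) -> str:
    """Lowercase, trim, and collapse internal whitespace so that
    'Billing ' and 'billing' count as the same vote."""
    return re.sub(r"\s+", " ", answer.strip().lower())


def is_red_flagged(answer: str, max_length: Optional[int] = 100) -> bool:
    """Flag empty responses and responses longer than max_length characters."""
    if not answer.strip():
        return True
    return max_length is not None and len(answer) > max_length


def votable(responses: List[str], max_length: Optional[int] = 100) -> List[str]:
    """Drop red-flagged responses, then normalize the survivors for voting."""
    return [normalize(r) for r in responses if not is_red_flagged(r, max_length)]


if __name__ == "__main__":
    raw = ["  Billing ", "billing", "Well, it depends... " * 10, ""]
    print(votable(raw))
```

Filtering before voting matters: a discarded response costs one sample, whereas a malformed response that enters the tally can delay or distort convergence.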

When to Use

  • ETL pipelines with thousands of transformations
  • Code migration across many files
  • Document processing at scale
  • Any task where errors compound over many steps

Testing

8 integration tests covering voting, red-flagging, and match strategies
Example in examples/workflows/maker.py

Checklist

  • Code follows project style guidelines
  • Tests added and passing
  • Example implementation included
  • No breaking changes

@lucidprogrammer lucidprogrammer force-pushed the feature/maker-reliability-pattern branch from 1a00e74 to 6ac923d Compare December 7, 2025 16:29
Implement MAKER (Massively decomposed Agentic processes with K-voting
Error Reduction) based on the paper "Solving a Million-Step LLM Task
with Zero Errors" (arXiv:2511.09030).

Key features:
- First-to-ahead-by-k voting for consensus-based reliability
- Red-flag filtering to discard suspicious responses
- Multiple match strategies: exact, normalized, structured
- Configurable k-margin and max_samples parameters

This enables high reliability with cost-effective models by trading
compute (multiple samples) for accuracy (statistical consensus).
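A back-of-the-envelope way to see the compute-for-accuracy trade: in a simplified model where every sample is independent and is either the correct answer (probability p) or one fixed wrong answer, first-to-ahead-by-k is a gambler's-ruin race, and the correct answer wins with probability 1 / (1 + ((1 - p) / p)^k). The two-answer assumption and the function name below are mine, for illustration only.

```python
def p_correct_wins(p: float, k: int) -> float:
    """Probability that the correct answer reaches a k-vote lead first,
    under the simplified two-answer, independent-sample model."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    ratio = (1.0 - p) / p
    return 1.0 / (1.0 + ratio ** k)


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        print(k, p_correct_wins(0.9, k))
```

With p = 0.9 and k = 3 this gives 729/730, so the per-step error drops from 10% to roughly 0.14% at the cost of extra samples per step.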

Includes:
- MakerAgent workflow implementation
- @fast.maker() decorator for easy integration
- Comprehensive integration tests
- Example demonstrating customer intent classification
@lucidprogrammer lucidprogrammer force-pushed the feature/maker-reliability-pattern branch from 6ac923d to 6b3f4b4 Compare December 7, 2025 16:34
@evalstate evalstate merged commit 73baea3 into evalstate:main Dec 12, 2025
6 checks passed
@evalstate
Owner

This is a very, very cool feature. A couple of quick questions - I can imagine using this quite a lot, but I haven't read the paper yet...!

  1. I often use what I call "responders" - agents with a set of template messages that don't retain history. Is it part of the design for the samples to share the same context, or would using responders help?
  2. Is there any overlap with using parallel as part of this too?

I've got a PR on the docs that I'd appreciate you reviewing. TY!

@iqdoctor
Contributor

@lucidprogrammer
Contributor Author

lucidprogrammer commented Dec 17, 2025

Yes, "responders" are actually the ideal use case here. Since maker asks the same question multiple times to vote, you want fresh, independent attempts. If an agent keeps history, it might get confused being asked the same thing twice or introduce bias. Stateless responders prevent that perfectly.

On the overlap with parallel: they solve different problems, I think. Parallel is for doing different things at once (speed/throughput), while Maker is for doing the same thing multiple times (redundancy/correctness). You can use them together, but one doesn't replace the other; that's my intuition on it.

@evalstate, you mentioned a PR to review? Could you send the link? Thanks.

@iqdoctor
Contributor

@lucidprogrammer Let me try to respond based on my understanding.

I’ve partially addressed the child history control problem by introducing an explicit parameter:

History / parallel controls supported:

  • history_mode (controls child history):

    • scratch: clones start with an empty history
    • fork (default): clones start from the template child history; no merge-back
    • fork_and_merge: clone history is merged back into the template child agent

In all cases, the history is taken from the main instance with the same agent name as the child, not directly from the orchestrator.
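The three `history_mode` values can be pictured with plain lists standing in for message histories. This is only a sketch of the semantics as described above: the `HistoryMode` enum, the helper names, and the choice to merge back only messages added after the fork are assumptions, not the actual implementation.

```python
from enum import Enum
from typing import List


class HistoryMode(str, Enum):
    SCRATCH = "scratch"
    FORK = "fork"
    FORK_AND_MERGE = "fork_and_merge"


def clone_history(template: List[str], mode: HistoryMode) -> List[str]:
    """Initial history for a clone: empty for scratch, a copy otherwise."""
    return [] if mode is HistoryMode.SCRATCH else list(template)


def finish_clone(
    template: List[str],
    clone: List[str],
    mode: HistoryMode,
    forked_len: int,
) -> None:
    """After the clone finishes, optionally merge its new messages back.

    Assumption: only messages the clone added after the fork are appended.
    """
    if mode is HistoryMode.FORK_AND_MERGE:
        template.extend(clone[forked_len:])


if __name__ == "__main__":
    template = ["system: classify intents"]
    clone = clone_history(template, HistoryMode.FORK_AND_MERGE)
    forked_len = len(clone)
    clone.append("user: where is my refund?")
    finish_clone(template, clone, HistoryMode.FORK_AND_MERGE, forked_len)
    print(template)
```

For voting, `scratch` (or `fork` from a fixed template) keeps samples independent; `fork_and_merge` makes later clones see earlier clones' messages, which can correlate their answers.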

I’m still thinking about whether it makes sense to add an option to pull history from the orchestrator when spawning child agents.

You’re absolutely right that, in most cases, you want fresh, independent attempts. The original paper effectively assumes no retained history and passes all necessary information explicitly at call time.

That said, I think there are legitimate cases where some accumulated history can be useful — for example, during a longer reasoning process where you reach a branching point and want to spawn N agents to vote on a decision. In that scenario, sharing a limited, relevant context might make sense.

Regarding the overlap with parallel: I suspect the question isn’t whether one replaces the other (they clearly serve different purposes), but whether the execution machinery can be shared.

From an implementation perspective:

  • launching N child tools, whether identical or different,
  • waiting for their completion,
  • collecting results,

is essentially the same core logic.
What differs is how the results are interpreted and combined (voting vs aggregation vs routing).

So I think there’s a good opportunity to reuse or unify the execution layer, while keeping the result-handling logic separate.
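A rough sketch of that split, using only the standard library: one generic launch-wait-collect function, with the interpretation step passed in as a callable. All names here (`run_children`, `majority`, `join_all`) are hypothetical, and the batch majority vote is a simplification; a first-to-ahead-by-k voter would sample incrementally rather than in one fixed batch.

```python
import asyncio
import functools
from collections import Counter
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_children(
    children: Sequence[Callable[[], Awaitable[T]]],
    combine: Callable[[List[T]], R],
) -> R:
    """Shared execution layer: launch all children, wait, collect, combine."""
    results = await asyncio.gather(*(child() for child in children))
    return combine(list(results))


def majority(results: List[str]) -> str:
    """Voting-style combiner: the most common answer wins."""
    return Counter(results).most_common(1)[0][0]


def join_all(results: List[str]) -> str:
    """Parallel-style combiner: keep every result, in launch order."""
    return "\n".join(results)


async def fake_child(answer: str) -> str:
    """Stand-in for a child agent call."""
    await asyncio.sleep(0)
    return answer


if __name__ == "__main__":
    same_task = [functools.partial(fake_child, a) for a in ("refund", "refund", "billing")]
    print(asyncio.run(run_children(same_task, majority)))
    print(asyncio.run(run_children(same_task, join_all)))
```

The same `run_children` call serves both cases; only the `combine` argument changes, which is the reuse opportunity described above.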
